Add nvidia-l4 gpu accelerator #2608

Merged 1 commit on Jul 26, 2024

eapolinario
Collaborator

Why are the changes needed?

The -vws suffix in the definition of the L4 GPU accelerator causes some confusion. According to the Google Cloud docs, either nvidia-l4 or nvidia-l4-vws can be specified to use L4 GPUs.

What changes were proposed in this pull request?

Rename the existing L4 constant to L4_VWS so the old value stays available, and define a new L4 GPU accelerator constant with the value nvidia-l4.
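
For illustration only, the renaming described above can be sketched with a small stand-in class. GPUAcceleratorSketch is a hypothetical placeholder, not flytekit's actual GPUAccelerator; it only stores the device string so the two constants can be compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPUAcceleratorSketch:
    """Hypothetical stand-in for flytekit's GPUAccelerator; it only stores the device string."""

    device: str


# New constant proposed by this PR: the plain accelerator name.
L4 = GPUAcceleratorSketch("nvidia-l4")

# Previous value of L4, kept available under a new name.
L4_VWS = GPUAcceleratorSketch("nvidia-l4-vws")
```

Code that imported L4 before this change would now get the nvidia-l4 device string; code that needs the old value would have to switch to L4_VWS.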

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Signed-off-by: Eduardo Apolinario <[email protected]>
@@ -133,7 +133,11 @@ def to_flyte_idl(self) -> tasks_pb2.GPUAccelerator:

 #: use this constant to specify that the task should run on an
 #: `NVIDIA L4 Tensor Core GPU <https://www.nvidia.com/en-us/data-center/l4/>`_
-L4 = GPUAccelerator("nvidia-l4-vws")
+L4 = GPUAccelerator("nvidia-l4")
Member

@mhotan Do you think changing the ID could break backward compatibility?

GCP is okay with both strings, but is this the same for AWS?

Collaborator Author

AFAIK, we've never tested this GPU accelerator code on AWS (we couldn't get those GPUs at the time). Looking at the code, this should work, but it requires verification.

Contributor

@samhita-alla samhita-alla Jul 26, 2024

@eapolinario I've used node_selector={"k8s.amazonaws.com/accelerator": "nvidia-tesla-l4"} / GPUAccelerator("nvidia-tesla-l4") to get L4 to work on AWS. We have L4 enabled on the demo hosted tenant.
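
A minimal sketch of how the per-cloud difference mentioned above might be captured. The AWS label key and device name come from the comment above; the GKE label key cloud.google.com/gke-accelerator, the lookup table, and the helper name l4_node_selector are assumptions for illustration and are not part of flytekit.

```python
# Node label key and L4 device name per cloud.
# The AWS pair is taken from the comment above; the GCP label key is an
# assumption based on GKE's accelerator node label convention.
L4_NODE_LABELS = {
    "gcp": ("cloud.google.com/gke-accelerator", "nvidia-l4"),
    "aws": ("k8s.amazonaws.com/accelerator", "nvidia-tesla-l4"),
}


def l4_node_selector(cloud: str) -> dict[str, str]:
    """Return a Kubernetes node selector for L4 nodes (hypothetical helper)."""
    try:
        key, device = L4_NODE_LABELS[cloud]
    except KeyError:
        raise ValueError(f"no L4 mapping known for cloud {cloud!r}") from None
    return {key: device}
```

Keeping the mapping in one table makes it easy to see that the same physical GPU is advertised under different device strings, which is why a single hard-coded constant may not work on every cluster.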

@eapolinario
Collaborator Author

@thomasjpfan, I also noticed that this change is going to break this unionai test: https://github.com/unionai/unionai/blob/main/tests/unit/actor/test_actor.py#L119

@thomasjpfan
Member

> I also noticed that this change is going to break this unionai test:

@eapolinario I'll adjust the unit tests after this is merged.

eapolinario merged commit fd0634e into master on Jul 26, 2024
45 of 47 checks passed
Mecoli1219 pushed a commit to Mecoli1219/flytekit that referenced this pull request Jul 27, 2024
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
mao3267 pushed a commit to mao3267/flytekit that referenced this pull request Aug 1, 2024
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
mao3267 pushed a commit to mao3267/flytekit that referenced this pull request Aug 2, 2024
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Signed-off-by: mao3267 <[email protected]>
Future-Outlier added a commit that referenced this pull request Aug 26, 2024
…class] (#2603)

* fix: set dataclass member as optional if default value is provided

Signed-off-by: mao3267 <[email protected]>

* lint

Signed-off-by: mao3267 <[email protected]>

* feat: handle nested dataclass conversion in JsonParamType

Signed-off-by: mao3267 <[email protected]>

* fix: handle errors caused by NoneType default value

Signed-off-by: mao3267 <[email protected]>

* test: add nested dataclass unit tests

Signed-off-by: mao3267 <[email protected]>

* Sagemaker dict determinism (#2597)

* truncate sagemaker agent outputs

Signed-off-by: Samhita Alla <[email protected]>

* fix tests and update agent output

Signed-off-by: Samhita Alla <[email protected]>

* lint

Signed-off-by: Samhita Alla <[email protected]>

* fix test

Signed-off-by: Samhita Alla <[email protected]>

* add idempotence token to workflow

Signed-off-by: Samhita Alla <[email protected]>

* fix type

Signed-off-by: Samhita Alla <[email protected]>

* fix mixin

Signed-off-by: Samhita Alla <[email protected]>

* modify output handler

Signed-off-by: Samhita Alla <[email protected]>

* make the dictionary deterministic

Signed-off-by: Samhita Alla <[email protected]>

* nit

Signed-off-by: Samhita Alla <[email protected]>

---------

Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* refactor(core): Enhance return type extraction logic (#2598)

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Feat: Make exception raised by external command authenticator more actionable (#2594)

Signed-off-by: Fabio Grätz <[email protected]>
Co-authored-by: Fabio Grätz <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Fix: Properly re-raise non-grpc exceptions during refreshing of proxy-auth credentials in auth interceptor (#2591)

Signed-off-by: Fabio Grätz <[email protected]>
Co-authored-by: Fabio Grätz <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* validate idempotence token length in subsequent tasks (#2604)

* validate idempotence token length in subsequent tasks

Signed-off-by: Samhita Alla <[email protected]>

* remove redundant param

Signed-off-by: Samhita Alla <[email protected]>

* add tests

Signed-off-by: Samhita Alla <[email protected]>

---------

Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Add nvidia-l4 gpu accelerator (#2608)

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* eliminate redundant literal conversion for `Iterator[JSON]` type (#2602)

* eliminate redundant literal conversion for `Iterator[JSON]` type

Signed-off-by: Samhita Alla <[email protected]>

* add test

Signed-off-by: Samhita Alla <[email protected]>

* lint

Signed-off-by: Samhita Alla <[email protected]>

* add isclass check

Signed-off-by: Samhita Alla <[email protected]>

---------

Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* [FlyteSchema] Fix numpy problems (#2619)

Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* add nim plugin (#2475)

* add nim plugin

Signed-off-by: Samhita Alla <[email protected]>

* move nim to inference

Signed-off-by: Samhita Alla <[email protected]>

* import fix

Signed-off-by: Samhita Alla <[email protected]>

* fix port

Signed-off-by: Samhita Alla <[email protected]>

* add pod_template method

Signed-off-by: Samhita Alla <[email protected]>

* add containers

Signed-off-by: Samhita Alla <[email protected]>

* update

Signed-off-by: Samhita Alla <[email protected]>

* clean up

Signed-off-by: Samhita Alla <[email protected]>

* remove cloud import

Signed-off-by: Samhita Alla <[email protected]>

* fix extra config

Signed-off-by: Samhita Alla <[email protected]>

* remove decorator

Signed-off-by: Samhita Alla <[email protected]>

* add tests, update readme

Signed-off-by: Samhita Alla <[email protected]>

* add env

Signed-off-by: Samhita Alla <[email protected]>

* add support for lora adapter

Signed-off-by: Samhita Alla <[email protected]>

* minor fixes

Signed-off-by: Samhita Alla <[email protected]>

* add startup probe

Signed-off-by: Samhita Alla <[email protected]>

* increase failure threshold

Signed-off-by: Samhita Alla <[email protected]>

* remove ngc secret group

Signed-off-by: Samhita Alla <[email protected]>

* move plugin to flytekit core

Signed-off-by: Samhita Alla <[email protected]>

* fix docs

Signed-off-by: Samhita Alla <[email protected]>

* remove hf group

Signed-off-by: Samhita Alla <[email protected]>

* modify podtemplate import

Signed-off-by: Samhita Alla <[email protected]>

* fix import

Signed-off-by: Samhita Alla <[email protected]>

* fix ngc api key

Signed-off-by: Samhita Alla <[email protected]>

* fix tests

Signed-off-by: Samhita Alla <[email protected]>

* fix formatting

Signed-off-by: Samhita Alla <[email protected]>

* lint

Signed-off-by: Samhita Alla <[email protected]>

* docs fix

Signed-off-by: Samhita Alla <[email protected]>

* docs fix

Signed-off-by: Samhita Alla <[email protected]>

* update secrets interface

Signed-off-by: Samhita Alla <[email protected]>

* add secret prefix

Signed-off-by: Samhita Alla <[email protected]>

* fix tests

Signed-off-by: Samhita Alla <[email protected]>

* add urls

Signed-off-by: Samhita Alla <[email protected]>

* add urls

Signed-off-by: Samhita Alla <[email protected]>

* remove urls

Signed-off-by: Samhita Alla <[email protected]>

* minor modifications

Signed-off-by: Samhita Alla <[email protected]>

* remove secrets prefix; add failure threshold

Signed-off-by: Samhita Alla <[email protected]>

* add hard-coded prefix

Signed-off-by: Samhita Alla <[email protected]>

* add comment

Signed-off-by: Samhita Alla <[email protected]>

* make secrets prefix a required param

Signed-off-by: Samhita Alla <[email protected]>

* move nim to flytekit plugin

Signed-off-by: Samhita Alla <[email protected]>

* update readme

Signed-off-by: Samhita Alla <[email protected]>

* update readme

Signed-off-by: Samhita Alla <[email protected]>

* update readme

Signed-off-by: Samhita Alla <[email protected]>

---------

Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* [Elastic/Artifacts] Pass through model card (#2575)

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Remove pyarrow as a direct dependency (#2228)

Signed-off-by: Thomas J. Fan <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Boolean flag to show local container logs to the terminal (#2521)

Signed-off-by: aditya7302 <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Enable Ray Fast Register (#2606)

Signed-off-by: Jan Fiedler <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* [Artifacts/Elastic] Skip partitions (#2620)

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Install flyteidl from master in plugins tests (#2621)

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Using ParamSpec to show underlying typehinting (#2617)

Signed-off-by: JackUrb <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Support ArrayNode mapping over Launch Plans (#2480)

* set up array node

Signed-off-by: Paul Dittamo <[email protected]>

* wip array node task wrapper

Signed-off-by: Paul Dittamo <[email protected]>

* support function like callability

Signed-off-by: Paul Dittamo <[email protected]>

* temp check in some progress on python func wrapper

Signed-off-by: Paul Dittamo <[email protected]>

* only support launch plans in new array node class for now

Signed-off-by: Paul Dittamo <[email protected]>

* add map task array node implementation wrapper

Signed-off-by: Paul Dittamo <[email protected]>

* ArrayNode only supports LPs for now

Signed-off-by: Paul Dittamo <[email protected]>

* support local execute for new array node implementation

Signed-off-by: Paul Dittamo <[email protected]>

* add local execute unit tests for array node

Signed-off-by: Paul Dittamo <[email protected]>

* set execution version in array node spec

Signed-off-by: Paul Dittamo <[email protected]>

* check input types for local execute

Signed-off-by: Paul Dittamo <[email protected]>

* remove code that is un-needed for now

Signed-off-by: Paul Dittamo <[email protected]>

* clean up array node class

Signed-off-by: Paul Dittamo <[email protected]>

* improve naming

Signed-off-by: Paul Dittamo <[email protected]>

* clean up

Signed-off-by: Paul Dittamo <[email protected]>

* utilize enum execution mode to set array node execution path

Signed-off-by: Paul Dittamo <[email protected]>

* default execution mode to FULL_STATE for new array node class

Signed-off-by: Paul Dittamo <[email protected]>

* support min_successes for new array node

Signed-off-by: Paul Dittamo <[email protected]>

* add map task wrapper unit test

Signed-off-by: Paul Dittamo <[email protected]>

* set min successes for array node map task wrapper

Signed-off-by: Paul Dittamo <[email protected]>

* update docstrings

Signed-off-by: Paul Dittamo <[email protected]>

* Install flyteidl from master in plugins tests

Signed-off-by: Eduardo Apolinario <[email protected]>

* lint

Signed-off-by: Paul Dittamo <[email protected]>

* clean up min success/ratio setting

Signed-off-by: Paul Dittamo <[email protected]>

* lint

Signed-off-by: Paul Dittamo <[email protected]>

* make array node class callable

Signed-off-by: Paul Dittamo <[email protected]>

---------

Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Richer printing for some artifact objects (#2624)

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* ci: Add Python 3.9 to build matrix (#2622)

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Future-Outlier <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* bump (#2627)

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Added alt prefix head to FlyteFile.new_remote (#2601)

* Added alt prefix head to FlyteFile.new_remote

Signed-off-by: pryce-turner <[email protected]>

* Added get_new_path method to FileAccessProvider, fixed new_remote method of FlyteFile

Signed-off-by: pryce-turner <[email protected]>

* Updated tests and added new path creator to FlyteFile/Dir new_remote methods

Signed-off-by: pryce-turner <[email protected]>

* Improved docstrings, fixed minor path sep bug, more descriptive naming, better test

Signed-off-by: pryce-turner <[email protected]>

* Formatting

Signed-off-by: pryce-turner <[email protected]>

---------

Signed-off-by: pryce-turner <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Feature gate for FlyteMissingReturnValueException (#2623)

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Remove use of multiprocessing from the OAuth client (#2626)

* Remove use of multiprocessing from the OAuth client

Signed-off-by: Robert Deaton <[email protected]>

* Lint

Signed-off-by: Robert Deaton <[email protected]>

---------

Signed-off-by: Robert Deaton <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Update codespell in precommit to version 2.3.0 (#2630)

Signed-off-by: mao3267 <[email protected]>

* Fix Snowflake Agent Bug (#2605)

* fix snowflake agent bug

Signed-off-by: Future-Outlier <[email protected]>

* a work version

Signed-off-by: Future-Outlier <[email protected]>

* Snowflake work version

Signed-off-by: Future-Outlier <[email protected]>

* fix secret encode

Signed-off-by: Future-Outlier <[email protected]>

* all works, I am so happy

Signed-off-by: Future-Outlier <[email protected]>

* improve additional protocol

Signed-off-by: Future-Outlier <[email protected]>

* fix tests

Signed-off-by: Future-Outlier <[email protected]>

* Fix Tests

Signed-off-by: Future-Outlier <[email protected]>

* update agent

Signed-off-by: Kevin Su <[email protected]>

* Add snowflake test

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* sd

Signed-off-by: Kevin Su <[email protected]>

* snowflake loglinks

Signed-off-by: Future-Outlier <[email protected]>

* add metadata

Signed-off-by: Future-Outlier <[email protected]>

* secret

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* remove table

Signed-off-by: Future-Outlier <[email protected]>

* add comment for get private key

Signed-off-by: Future-Outlier <[email protected]>

* update comments:

Signed-off-by: Future-Outlier <[email protected]>

* Fix Tests

Signed-off-by: Future-Outlier <[email protected]>

* update comments

Signed-off-by: Future-Outlier <[email protected]>

* update comments

Signed-off-by: Future-Outlier <[email protected]>

* Better Secrets

Signed-off-by: Future-Outlier <[email protected]>

* use union secret

Signed-off-by: Future-Outlier <[email protected]>

* Update Changes

Signed-off-by: Future-Outlier <[email protected]>

* use if not get_plugin().secret_requires_group()

Signed-off-by: Future-Outlier <[email protected]>

* Use Union SDK

Signed-off-by: Future-Outlier <[email protected]>

* Update

Signed-off-by: Future-Outlier <[email protected]>

* Fix Secrets

Signed-off-by: Future-Outlier <[email protected]>

* Fix Secrets

Signed-off-by: Future-Outlier <[email protected]>

* remove package.json

Signed-off-by: Future-Outlier <[email protected]>

* lint

Signed-off-by: Future-Outlier <[email protected]>

* add snowflake-connector-python

Signed-off-by: Future-Outlier <[email protected]>

* fix test_snowflake

Signed-off-by: Future-Outlier <[email protected]>

* Try to fix tests

Signed-off-by: Future-Outlier <[email protected]>

* fix tests

Signed-off-by: Future-Outlier <[email protected]>

* Try Fix snowflake Import

Signed-off-by: Future-Outlier <[email protected]>

* snowflake test passed

Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* run test_missing_return_value on python 3.10+ (#2637)

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* [Elastic] Fix context usage and apply fix to fork method (#2628)

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Add flytekit-omegaconf plugin (#2299)

* add flytekit-hydra

Signed-off-by: mg515 <[email protected]>

* fix small typo readme

Signed-off-by: mg515 <[email protected]>

* ruff ruff

Signed-off-by: mg515 <[email protected]>

* lint more

Signed-off-by: mg515 <[email protected]>

* rename plugin into flytekit-omegaconf

Signed-off-by: mg515 <[email protected]>

* lint sort imports

Signed-off-by: mg515 <[email protected]>

* use flytekit logger

Signed-off-by: mg515 <[email protected]>

* use flytekit logger #2

Signed-off-by: mg515 <[email protected]>

* fix typing info in is_flatable

Signed-off-by: mg515 <[email protected]>

* use default_factory instead of mutable default value

Signed-off-by: mg515 <[email protected]>

* add python3.11 and python3.12 to setup.py

Signed-off-by: mg515 <[email protected]>

* make fmt

Signed-off-by: mg515 <[email protected]>

* define error message only once

Signed-off-by: mg515 <[email protected]>

* add docstring

Signed-off-by: mg515 <[email protected]>

* remove GenericEnumTransformer and tests

Signed-off-by: mg515 <[email protected]>

* fallback to TypeEngine.get_transformer(node_type) to find suitable transformer

Signed-off-by: mg515 <[email protected]>

* explicit valueerrors instead of asserts

Signed-off-by: mg515 <[email protected]>

* minor style improvements

Signed-off-by: mg515 <[email protected]>

* remove obsolete warnings

Signed-off-by: mg515 <[email protected]>

* import flytekit logger instead of instantiating our own

Signed-off-by: mg515 <[email protected]>

* docstrings in reST format

Signed-off-by: mg515 <[email protected]>

* refactor transformer mode

Signed-off-by: mg515 <[email protected]>

* improve docs

Signed-off-by: mg515 <[email protected]>

* refactor dictconfig class into smaller methods

Signed-off-by: mg515 <[email protected]>

* add unit tests for dictconfig transformer

Signed-off-by: mg515 <[email protected]>

* refactor of parse_type_description()

Signed-off-by: mg515 <[email protected]>

* add omegaconf plugin to pythonbuild.yaml

---------

Signed-off-by: mg515 <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Adds extra-index-url to default image builder (#2636)

Signed-off-by: Thomas J. Fan <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* reference_task should inherit from PythonTask (#2643)

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* Fix Get Agent Secret Using Key (#2644)

Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: mao3267 <[email protected]>

* fix: prevent converting Flyte types as custom dataclasses

Signed-off-by: mao3267 <[email protected]>

* fix: add None to output type

Signed-off-by: mao3267 <[email protected]>

* test: add unit test for nested dataclass inputs

Signed-off-by: mao3267 <[email protected]>

* test: add unit tests for nested dataclass, dataclass default value as None, and flyte type exceptions

Signed-off-by: mao3267 <[email protected]>

* fix: handle NoneType as default value of list type dataclass members

Signed-off-by: mao3267 <[email protected]>

* fix: add comments for `has_nested_dataclass` function

Signed-off-by: mao3267 <[email protected]>

* fix: make lint

Signed-off-by: mao3267 <[email protected]>

* fix: update tests regarding input through file and pipe

Signed-off-by: mao3267 <[email protected]>

* Make JsonParamType convert faster

Signed-off-by: Future-Outlier <[email protected]>

* make has_nested_dataclass func more clean and add tests for dataclass_with_optional_fields

Signed-off-by: Future-Outlier <[email protected]>

* make logic more backward compatible

Signed-off-by: Future-Outlier <[email protected]>

* fix: handle indexing errors in dict/list while checking nested dataclass, add comments

Signed-off-by: mao3267 <[email protected]>

---------

Signed-off-by: mao3267 <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Co-authored-by: Future-Outlier <[email protected]>
5 participants